
    Neuromorphic Acceleration for Approximate Bayesian Inference on Neural Networks via Permanent Dropout

    As neural networks have begun performing increasingly critical tasks for society, ranging from driving cars to identifying candidates for drug development, the value of their ability to perform uncertainty quantification (UQ) in their predictions has risen commensurately. Permanent dropout, a popular method for neural network UQ, involves injecting stochasticity into the inference phase of the model and generating many predictions for each test point. This shifts the computational and energy burden of deep neural networks from the training phase to the inference phase. Recent work has demonstrated near-lossless conversion of classical deep neural networks to their spiking counterparts. We use these results to demonstrate the feasibility of conducting the inference phase with permanent dropout on spiking neural networks, mitigating the technique's computational and energy burden, which is essential for its use at scale or on edge platforms. We demonstrate the proposed approach via the Nengo spiking neural simulator on a combination drug therapy dataset for cancer treatment, where UQ is critical. Our results indicate that the spiking approximation gives a predictive distribution practically indistinguishable from that given by the classical network.
    Comment: 4 pages, 4 figures. Submitted to the International Conference on Neuromorphic Systems (ICONS) 2019.
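
    The mechanism at the heart of this approach, keeping dropout layers active at inference time and pooling many stochastic forward passes into a predictive distribution, can be sketched in a few lines of PyTorch. This is a minimal illustration only: the toy network and random inputs below are placeholders, not the paper's Nengo spiking network or its drug-therapy data.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in regressor; the paper's model targets a
# combination drug therapy dataset and a Nengo spiking conversion.
model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)

def predictive_distribution(model, x, n_samples=100):
    """Keep dropout active at inference and pool stochastic forward passes."""
    model.train()                      # leaves the Dropout layers switched on
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

x = torch.randn(16, 8)                           # a batch of 16 test points
mean, std = predictive_distribution(model, x)    # predictive mean and spread
```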

    A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes

    The canonical solution methodology for finite constrained Markov decision processes (CMDPs), where the objective is to maximize the expected infinite-horizon discounted reward subject to constraints on the expected infinite-horizon discounted costs, is based on linear programming. In this brief, we first prove that the optimization objective in the dual linear program of a finite CMDP is a piecewise linear convex (PWLC) function of the Lagrange penalty multipliers. Next, we propose a novel two-level Gradient-Aware Search (GAS) algorithm that exploits the PWLC structure to find the optimal state-value function and Lagrange penalty multipliers of a finite CMDP. The proposed algorithm is applied to two constrained stochastic control problems: robot navigation in a grid world and solar-powered unmanned aerial vehicle (UAV)-based wireless network management. We empirically compare the convergence performance of the proposed GAS algorithm with binary search (BS), Lagrangian primal-dual optimization (PDO), and linear programming (LP). Compared with these benchmarks, the proposed GAS algorithm converges to the optimal solution faster, does not require hyperparameter tuning, and is not sensitive to the initialization of the Lagrange penalty multiplier.
    Comment: Submitted as a brief paper to the IEEE TNNLS.
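
    The PWLC structure admits a search that is sharper than plain bisection: the supporting lines at two bracketing multipliers intersect at the next candidate point. The sketch below shows that idea for a single multiplier under simplifying assumptions of my own; `dual` is a hypothetical callable that solves the lambda-penalized unconstrained MDP and returns the dual value together with a subgradient (cost budget minus realized discounted cost). The paper's actual two-level algorithm is more general.

```python
def gas_scalar(dual, lam_lo, lam_hi, tol=1e-6, max_iter=50):
    """Sketch of a gradient-aware search over one Lagrange multiplier.

    dual(lam) -> (D(lam), g), where D is the piecewise linear convex dual
    of the CMDP and g = budget - discounted_cost(pi_lam) is a subgradient.
    Assumes g < 0 at lam_lo and g > 0 at lam_hi, so the minimum is bracketed.
    """
    (v_lo, g_lo), (v_hi, g_hi) = dual(lam_lo), dual(lam_hi)
    for _ in range(max_iter):
        # Candidate: intersection of the supporting lines at both endpoints.
        lam = (v_hi - v_lo + g_lo * lam_lo - g_hi * lam_hi) / (g_lo - g_hi)
        v, g = dual(lam)
        if abs(g) < tol or abs(lam_hi - lam_lo) < tol:
            return lam, v
        if g < 0:                      # constraint still violated: raise lambda
            lam_lo, v_lo, g_lo = lam, v, g
        else:                          # constraint satisfied: lower lambda
            lam_hi, v_hi, g_hi = lam, v, g
    return lam, v
```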

    Meta Continual Learning via Dynamic Programming

    Meta-continual learning algorithms seek to rapidly train a model when faced with similar tasks sampled sequentially from a task distribution. Although impressive strides have been made in this area, there is no theoretical framework that enables systematic analysis of key learning challenges such as generalization and catastrophic forgetting. We introduce a new theoretical framework for meta-continual learning based on dynamic programming, analyze generalization and catastrophic forgetting, and establish conditions of optimality. We show that existing meta-continual learning methods can be derived from the proposed dynamic programming framework. Moreover, we develop a new dynamic-programming-based meta-continual learning approach that adopts a stochastic-gradient-driven alternating optimization method. We show that, on meta-continual learning benchmark data sets, our theoretically grounded approach is better than or comparable to the purely empirical strategies adopted by existing state-of-the-art methods.
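
    As a rough illustration of what stochastic-gradient-driven alternating optimization over sequential tasks can look like, the sketch below alternates updates between a shared representation and a per-task head on toy data. This is a generic construction of my own for illustration; the paper's updates are derived from its dynamic programming framework and will differ in detail.

```python
import torch

# Hypothetical shapes: a shared representation phi and a per-task head w.
phi = torch.randn(32, 16, requires_grad=True)

def loss(phi, w, x, y):
    return ((x @ phi @ w - y) ** 2).mean()

def alternating_step(phi, w, x, y, lr=1e-2):
    """One alternating step: update the task head with the representation
    held fixed, then update the shared representation."""
    for p in (w, phi):
        l = loss(phi, w, x, y)
        g, = torch.autograd.grad(l, p)
        p.data -= lr * g
    return loss(phi, w, x, y).item()

# Tasks arrive sequentially from a task distribution.
for task in range(5):
    x, y = torch.randn(64, 32), torch.randn(64, 1)   # stand-in task data
    w = torch.randn(16, 1, requires_grad=True)       # fresh head per task
    for _ in range(100):
        alternating_step(phi, w, x, y)
```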

    Graph Neural Network Architecture Search for Molecular Property Prediction

    Predicting the properties of a molecule from its structure is a challenging task. Recently, deep learning methods have improved the state of the art for this task because of their ability to learn useful features from the given data. By treating molecular structure as a graph, where atoms and bonds are modeled as nodes and edges, graph neural networks (GNNs) have been widely used to predict molecular properties. However, developing GNNs for a given data set relies on labor-intensive design and tuning of the network architecture. Neural architecture search (NAS) is a promising approach for discovering high-performing neural network architectures automatically. To that end, we develop an NAS approach to automate the design and development of GNNs for molecular property prediction. Specifically, we focus on the automated development of message-passing neural networks (MPNNs) to predict the molecular properties of small molecules in quantum mechanics and physical chemistry data sets from the MoleculeNet benchmark. We demonstrate the superiority of the automatically discovered MPNNs by comparing them with manually designed GNNs from the MoleculeNet benchmark. We study the relative importance of the choices in the MPNN search space, showing that customizing the architecture is critical to enhancing performance in molecular property prediction and that the proposed approach can perform the customization automatically with minimal manual effort.
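
    To make the NAS idea concrete, here is a deliberately simple random-search sketch over a hypothetical MPNN choice space. Both the search space and the `train_and_score` evaluation function are assumptions introduced for illustration; the paper's search space and search strategy are its own.

```python
import random

# Hypothetical MPNN search space; the paper's space covers more choices.
SEARCH_SPACE = {
    "hidden_dim":   [32, 64, 128, 256],
    "num_mp_steps": [2, 3, 4, 5, 6],          # message-passing iterations
    "aggregation":  ["sum", "mean", "max"],
    "update_fn":    ["gru", "mlp"],
    "readout":      ["set2set", "mean_pool"],
}

def sample_architecture(space):
    """Draw one architecture (one choice per dimension of the space)."""
    return {k: random.choice(v) for k, v in space.items()}

def random_search(space, train_and_score, n_trials=50):
    """Baseline NAS loop: sample, evaluate, keep the best configuration.
    train_and_score(arch) is assumed to train an MPNN with the given
    choices and return a validation metric (higher is better)."""
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(space)
        score = train_and_score(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```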

    MaLTESE: Large-Scale Simulation-Driven Machine Learning for Transient Driving Cycles

    Optimal engine operation during a transient driving cycle is the key to achieving greater fuel economy, engine efficiency, and reduced emissions. In order to achieve continuously optimal engine operation, engine calibration methods use a combination of static correlations obtained from dynamometer tests for steady-state operating points and road and/or track performance data. As the parameter space of control variables, design variable constraints, and objective functions grows, the cost and duration of optimal calibration become prohibitively large. In order to reduce the number of dynamometer tests required for calibrating modern engines, a large-scale simulation-driven machine learning approach is presented in this work. A parallel, fast, robust, physics-based reduced-order engine simulator is used to obtain performance and emission characteristics of engines over a wide range of control parameters under various transient driving conditions (drive cycles). We scale the simulation up to 3,906 nodes of the Theta supercomputer at the Argonne Leadership Computing Facility to generate the data required to train a machine learning model. The trained model is then used to predict various engine parameters of interest. Our results show that a deep-neural-network-based surrogate model achieves high accuracy for engine parameters such as exhaust temperature, exhaust pressure, nitric oxide, and engine torque. Once trained, the surrogate model is fast at inference: it requires about 16 microseconds to predict the performance and emissions for a single design configuration, compared with about 0.5 s per configuration with the engine simulator. Moreover, we demonstrate that transfer learning and retraining can be leveraged to incrementally retrain the surrogate model to cope with new configurations that fall outside the training data space.
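
    The surrogate-plus-transfer-learning pattern the abstract describes can be sketched compactly in PyTorch. The dimensions, stand-in tensors, and fine-tuning schedule below are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: 10 control parameters in, 4 outputs out
# (e.g., exhaust temperature, exhaust pressure, NO, torque).
surrogate = nn.Sequential(
    nn.Linear(10, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 4),
)

def fit(model, x, y, epochs=200, lr=1e-3):
    """Full-batch MSE training loop (kept minimal for the sketch)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

# Initial training on simulator-generated data (stand-in tensors here).
x_sim, y_sim = torch.randn(10000, 10), torch.randn(10000, 4)
fit(surrogate, x_sim, y_sim)

# Transfer learning: briefly retrain on configurations outside the
# original training-data space, with a smaller learning rate.
x_new, y_new = torch.randn(500, 10), torch.randn(500, 4)
fit(surrogate, x_new, y_new, epochs=50, lr=1e-4)
```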

    Towards On-Chip Bayesian Neuromorphic Learning

    If edge devices are to be deployed to critical applications where their decisions could have serious financial, political, or public-health consequences, they will need a way to signal when they are not sure how to react to their environment. For instance, a lost delivery drone could make its way back to a distribution center or contact the client if it is confused about how exactly to make its delivery, rather than taking the action that is "most likely" correct. This issue is compounded for health care or military applications. However, the temporal credit assignment problem that brain-realistic neuromorphic learning algorithms must solve is difficult. The double role that weights play in backpropagation-based learning, dictating how the network reacts to both input and feedback, needs to be decoupled. e-prop 1 is a promising learning algorithm that tackles this with Broadcast Alignment (a technique in which the network weights are replaced with random weights in the feedback path) and accumulated local information. We investigate under what conditions the Bayesian loss term can be expressed in a similar fashion, proposing an algorithm that can likewise be computed with only local information and is thus no more difficult to implement on hardware. The algorithm is exhibited on a store-recall problem, and the results suggest that it can learn good uncertainty on decisions to be made over time.
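
    The Broadcast Alignment ingredient, replacing the transposed forward weights with a fixed random matrix in the feedback path so that no weight transport is needed, is easy to show in isolation. The NumPy sketch below is a plain (non-spiking) single-hidden-layer illustration of that one idea, not e-prop 1 or the proposed Bayesian extension.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 32, 2

W1 = rng.normal(size=(n_hid, n_in)) * 0.1
W2 = rng.normal(size=(n_out, n_hid)) * 0.1
B = rng.normal(size=(n_hid, n_out)) * 0.1     # fixed random feedback matrix

lr = 1e-2
for _ in range(1000):
    x = rng.normal(size=n_in)
    y = np.array([x.sum(), x[0]])             # toy regression target
    h = np.tanh(W1 @ x)                       # forward pass
    e = W2 @ h - y                            # output error
    # Feedback: broadcast the error through the fixed random B rather
    # than W2.T, removing the weight-transport requirement.
    dh = (B @ e) * (1.0 - h ** 2)
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(dh, x)
```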

    Reduced-order modeling of advection-dominated systems with recurrent neural networks and convolutional autoencoders

    A common strategy for the dimensionality reduction of nonlinear partial differential equations relies on the use of the proper orthogonal decomposition (POD) to identify a reduced subspace and the Galerkin projection for evolving the dynamics in this reduced space. However, advection-dominated PDEs are represented poorly by this methodology because truncation discards important interactions between higher-order modes during time evolution. In this study, we demonstrate that an encoding using convolutional autoencoders (CAEs), followed by reduced-space time evolution with recurrent neural networks, overcomes this limitation effectively. We demonstrate that a truncated system of only two latent-space dimensions can reproduce a sharp advecting shock profile for the viscous Burgers equation with very low viscosities, and that a six-dimensional latent space can recreate the evolution of the inviscid shallow water equations. Additionally, the proposed framework is extended to a parametric reduced-order model by directly embedding parametric information into the latent space to detect trends in system evolution. Our results show that these advection-dominated systems are more amenable to low-dimensional encoding and time evolution by a CAE and recurrent neural network combination than by the POD-Galerkin technique.
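
    Architecturally, the framework pairs a convolutional autoencoder that compresses each field snapshot into a small latent vector with a recurrent network that advances those latents in time. The PyTorch sketch below assumes 1D fields of length 128 and a two-dimensional latent space (as in the Burgers example); the layer sizes are illustrative choices of my own.

```python
import torch
import torch.nn as nn

class CAE(nn.Module):
    """1D convolutional autoencoder: field snapshot <-> low-dim latent."""
    def __init__(self, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 32, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 32), nn.ReLU(),
            nn.Unflatten(1, (32, 32)),
            nn.ConvTranspose1d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, 4, stride=2, padding=1),
        )

class LatentLSTM(nn.Module):
    """Advances latent trajectories in the CAE's reduced space."""
    def __init__(self, latent_dim=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(latent_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, latent_dim)

    def forward(self, z_seq):                  # (batch, time, latent_dim)
        out, _ = self.lstm(z_seq)
        return self.head(out)                  # next-step latent prediction

cae = CAE()
u = torch.randn(4, 1, 128)                     # four snapshots of a 1D field
z = cae.encoder(u)                             # (4, 2) latent coordinates
u_rec = cae.decoder(z)                         # (4, 1, 128) reconstruction
```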

    Non-autoregressive time-series methods for stable parametric reduced-order models

    Advection-dominated dynamical systems, characterized by partial differential equations, are found in applications ranging from weather forecasting to engineering design, where accuracy and robustness are crucial. There has been significant interest in using techniques borrowed from machine learning to reduce the computational expense and/or improve the accuracy of predictions for these systems. These techniques rely on identifying a basis that reduces the dimensionality of the problem and then applying time series and sequential learning methods to forecast the evolution of the reduced state. Often, however, machine-learned predictions after reduced-basis projection are plagued by stability issues stemming from incomplete capture of multiscale processes as well as from error growth over long forecast durations. To address these issues, we have developed a non-autoregressive time series approach for predicting linear reduced-basis time histories of forward models. In particular, we demonstrate that non-autoregressive counterparts of sequential learning methods such as the long short-term memory (LSTM) network considerably improve the stability of machine-learned reduced-order models. We evaluate our approach on the inviscid shallow water equations and show that a non-autoregressive variant of the standard LSTM approach that is bidirectional in the PCA components obtains the best accuracy for recreating the nonlinear dynamics of partial observations. Moreover, and critically for many applications of these surrogates, inference times are reduced by three orders of magnitude with our approach, compared with both the equation-based Galerkin projection method and the standard LSTM approach.
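
    The autoregressive/non-autoregressive distinction is worth making concrete: an autoregressive LSTM feeds its own prediction back in as the next input, so errors compound over the horizon, whereas a non-autoregressive model maps (parameters, time stamp) directly to each set of reduced coefficients. Below is a hedged PyTorch sketch of the latter, with illustrative sizes of my own choosing.

```python
import torch
import torch.nn as nn

class NonARForecaster(nn.Module):
    """Non-autoregressive: maps (parameters, time stamp) directly to the
    reduced-basis coefficients, so errors never feed back on themselves."""
    def __init__(self, n_params=1, n_coeffs=4, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_params + 1, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_coeffs)

    def forward(self, params, times):
        # params: (batch, n_params); times: (batch, T, 1)
        p = params.unsqueeze(1).expand(-1, times.shape[1], -1)
        return self.head(self.lstm(torch.cat([p, times], dim=-1))[0])

model = NonARForecaster()
params = torch.randn(8, 1)             # e.g., one physical parameter per case
times = torch.linspace(0, 1, 100).reshape(1, 100, 1).expand(8, -1, -1)
coeffs = model(params, times)          # (8, 100, 4): whole trajectory at once
```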

    Deep-Ensemble-Based Uncertainty Quantification in Spatiotemporal Graph Neural Networks for Traffic Forecasting

    Deep-learning-based data-driven forecasting methods have produced impressive results for traffic forecasting. A major limitation of these methods, however, is that they provide forecasts without estimates of uncertainty, which are critical for real-time deployments. We focus on the diffusion convolutional recurrent neural network (DCRNN), a state-of-the-art method for short-term traffic forecasting. We develop a scalable deep ensemble approach to quantify uncertainties for DCRNN. Our approach uses a scalable Bayesian optimization method to perform hyperparameter optimization, selects a set of high-performing configurations, fits a generative model to capture the joint distribution of the hyperparameter configurations, and trains an ensemble of models by sampling a new set of hyperparameter configurations from the generative model. We demonstrate the efficacy of the proposed methods by comparing them with other uncertainty estimation techniques. We show that our generic and scalable approach outperforms the current state-of-the-art Bayesian technique and a number of commonly used frequentist techniques.
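
    Whatever the ensemble-construction strategy, the final pooling step is simple. The sketch below assumes each ensemble member returns a per-point predictive mean and variance and combines them with the standard mixture decomposition (mean of variances plus variance of means); the hyperparameter sampling and DCRNN training that produce the members are elided.

```python
import torch

def ensemble_forecast(models, x):
    """Pool an ensemble's (mean, variance) outputs.

    Each model is assumed to return a per-point predictive mean and
    variance. The mixture variance decomposes into aleatoric
    (mean of variances) and epistemic (variance of means) parts.
    """
    means, variances = zip(*(m(x) for m in models))
    means, variances = torch.stack(means), torch.stack(variances)
    mean = means.mean(dim=0)
    var = variances.mean(dim=0) + means.var(dim=0, unbiased=False)
    return mean, var
```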

    Site-specific graph neural network for predicting protonation energy of oxygenate molecules

    Bio-oil molecule assessment is essential for the sustainable development of chemicals and transportation fuels. These oxygenated molecules contain carbon, hydrogen, and oxygen atoms that can be used for developing new value-added molecules (chemicals or transportation fuels). One motivation for our study stems from the fact that liquid-phase upgrading using a mineral acid is a cost-effective chemical transformation. In this upgrading process, adding a proton (a positively charged atomic hydrogen) to an oxygen atom is the central step. The protonation energies of the oxygen atoms in a molecule determine the thermodynamic feasibility of the reaction and the likely chemical reaction pathway. A quantum chemical model based on coupled cluster theory can compute accurate thermochemical properties such as the protonation energies of oxygen atoms and the feasibility of protonation-based chemical transformations, but it is too computationally expensive to explore a large space of chemical transformations. We develop a graph neural network approach for predicting the protonation energies of the oxygen atoms of hundreds of oxygenate molecules in order to predict the feasibility of aqueous acidic reactions. Our approach relies on an iterative local nonlinear embedding that gradually extends the influence of distant atoms and an output layer that predicts the protonation energy. The approach is geared toward site-specific predictions for the individual oxygen atoms of a molecule, in contrast to commonly used graph convolutional networks that predict a single property for an entire molecule. We demonstrate that our approach is effective in learning the locations and magnitudes of the protonation energies of oxygenated molecules.
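
    The site-specific element, a per-node readout after several rounds of message passing, restricted to the oxygen atoms, can be sketched as follows. The tensor layouts, layer choices, and dense adjacency aggregation are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SiteSpecificGNN(nn.Module):
    """Message passing followed by a per-node (not pooled) readout,
    masked to the oxygen atoms of interest."""
    def __init__(self, n_feats=16, hidden=64, n_steps=4):
        super().__init__()
        self.embed = nn.Linear(n_feats, hidden)
        self.update = nn.GRUCell(hidden, hidden)
        self.readout = nn.Linear(hidden, 1)    # protonation energy per site
        self.n_steps = n_steps

    def forward(self, x, adj, oxygen_mask):
        # x: (n_atoms, n_feats); adj: (n_atoms, n_atoms); mask: (n_atoms,)
        h = torch.relu(self.embed(x))
        for _ in range(self.n_steps):          # iterative local embedding
            m = adj @ h                        # aggregate neighbor messages
            h = self.update(m, h)              # nonlinear node update
        return self.readout(h).squeeze(-1)[oxygen_mask]

# Toy 12-atom molecule; the mask marks which atoms are oxygens.
x = torch.randn(12, 16)
adj = (torch.rand(12, 12) > 0.8).float()
oxygen_mask = torch.zeros(12, dtype=torch.bool)
oxygen_mask[[2, 7]] = True
energies = SiteSpecificGNN()(x, adj, oxygen_mask)   # one value per oxygen
```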